Abstract

Background: Phase transitions are widespread in the biological world, for example the progression through
cell cycle phases, the stages of cell differentiation and the development of disease. Such a nonlinear
phenomenon is regarded as the conversion of a biological system from one phenotype/state to another.
Studies on the molecular mechanisms of biological phase transitions have attracted much attention, in
particular on the different genotypes (or expression variations) within a specific phase, but with less
focus on the cascade changes of genes' functions (or system state) during the phase shift or transition
process. It is nevertheless a fundamental and important task to trace the temporal characteristics of a
biological system during a specific phase transition process, which can offer clues for understanding
the dynamic behaviors of living organisms.

Results: To overcome the limitations of traditional time segmentation and temporal biclustering methods,
we propose a causal process model (CPM) that studies biological phase transitions in a systematic manner:
first, we perform gene-specific segmentation of time-course expression data with a new boundary gene
estimation scheme, and then we infer functional cascade dynamics by constructing a temporal block
network. After computational validation on synthetic data, CPM was used to analyze the well-known Yeast
cell cycle data. The dynamics of the boundary genes were found to be periodic and consistent with the
phases of the cell cycle, and the temporal block network demonstrates a meaningful cascade structure of
the enriched biological functions. In addition, we studied protein modules based on the temporal block
network, which reflect temporal features in different cycles.

Conclusions: These results demonstrate that CPM is effective and efficient compared with traditional
methods, and that it is able to elucidate the essential regulatory mechanisms of a biological system even
with complicated nonlinear phase transitions.




Introduction 

In the biological world, a phase transition can be defined as the transformation of a biological system
from one phenotype or state to another, where different phenotypes can be mapped to distinct states. For
example, the cell cycle is known to have four distinct phases: G1, S, G2 and M; cell differentiation
contains different stages such as cell proliferation, growth arrest and mature differentiation; and
cancer development mainly involves three steps: mutation, promotion and invasion. Analysing these
biological phase transitions offers valuable clues for understanding life and its dynamics. Therefore, a
fundamental and important question is how to trace the temporal characteristics or dynamics of a
biological system during a particular phase transition process.

The molecular mechanisms of biological phase transitions have attracted much attention [1-4]. For
instance, by modulating the intracellular redox state and measuring cell cycle progression, the redox
cycle within the mouse embryonic fibroblast cell cycle was found to maintain the metabolic processes
early in G1 and to activate G1-regulatory proteins ahead of entry into S phase [1]. For the migratory
locust, a well-known agricultural pest with a phase transition from the solitary to the gregarious form,
many down-regulated and some up-regulated genes were found in various organs on arrival at the
gregarious phase [2], which provides molecular indicators and uncovers genetic mechanisms of phase
transition in locusts. To determine the dormancy status of raspberry buds, whose developmental
regulation is important for the economic value of the fruit and horticultural industries, a few
significant dormancy-related candidate genes were identified by principal component analysis of clones'
expression [5]. Generally speaking, these studies focus on the different genotypes or expression
variations at the level of individual genes under specific phases. Despite this progress, much less
attention has been paid to the cascade changes or sequential dynamics of the functions of genes or
modules at the network level during the phase transition process.

It is well known that one gene generally has multiple roles in biological processes, but which role it
plays at a specific time is still unclear. Thus, identifying a gene functional group or module, which is
composed of genes cooperating in biological processes or pathways, can reveal the functional specificity
of individual genes or network modules. On the other hand, although there is now rich information on
biological processes [6,7], this information generally lacks dynamic features, even compared with
pathways [8,9]. Hence, in this paper we intend to identify the sequential structure or cascade dynamics
of biological processes during phase transitions by developing a general framework for gene-specific
segmentation and temporal block networks (or network modules), in particular to determine when, and
which, biological process or function is cooperatively facilitated by network modules (or gene modules)
during a phase transition. Note that, in previous studies, the term "dynamic biological process" was
usually used to refer to the dynamics of some general biological functional work-flow rather than the
sequential dynamics of biological processes or pathways [10-12]. In contrast, our work focuses on the
conditional and time-dependent behaviours, or sequential dynamics, of network modules that are
functionally enriched in specific biological processes [13].

The rapid accumulation of temporal gene expression data provides the opportunity to unveil the
mechanisms of dynamic processes behind phenotype changes. In particular, a recent work shows that a
temporal dynamical model is able to detect the presence and absence of stage/phase specific biological
processes in the Yeast cell cycle and metabolic cycle [13]. However, this model is limited to a time
segmentation shared by all genes, and simply uses the replicated observations to infer the temporal
coordination of biological processes. To overcome this problem, a new bicluster-based temporal
segmentation method is developed in this paper to build a causal process model (CPM) for identifying the
temporal features of biological processes during genotype or system reorganizations. In addition to
biological processes and pathways, network modules or protein complexes [14] are used to further
illustrate the sequential dynamics of biological systems as the molecular basis of those functional
temporal features. Indeed, protein modules or protein complexes have been found to play many important
roles in biological phase changes, for example as indicators of genetic effects during mammary gland
oncogenesis [15], markers for cancer diagnosis and prognosis [16], predictors of genotype-phenotype
associations [17,18], and responders to dynamic cues from the environment [19].

In summary, the construction of our causal process model (CPM) includes three steps. First, we identify
specific biclusters with linear patterns, and assemble them into temporal blocks, each representing a
group of genes and their time segmentation. Then, each temporal block is refined by conducting
functional enrichment analysis. Finally, we infer the sequential or cascade (causal) relations between
temporal blocks by a graphical model (e.g., partial correlation) between two groups of genes. Through
various experiments, we demonstrate the effectiveness of our method for gene-specific temporal
segmentation. In particular, on Yeast cell cycle data, we show that the phase division based on CPM is
more efficient and effective than the segmentation based on the traditional CCC-biclustering method
[20]; and in the analysis of phase/cell cycle related biological processes, we found that the groups of
genes display conditional functional enrichment and protein interaction network rewiring. All these
results show that CPM is able to unveil the biological mechanisms behind complicated phase transitions.

Method 

Causal process model: temporal block based on 
biclusters' assembler 

Unlike traditional time segmentation methods, which require the same division of a time period for all
genes [13] (see Figure 1 (A)), the present work considers gene-specific time segmentation. That is,
different genes or gene groups can have different time segmentations according to their expression,
which gives a general framework without the uniform division constraint. This is why the biclustering
methodology [21,22] (see Figure 1 (B)), which can group genes and conditions simultaneously, is adopted.
However, as discussed in the study of the temporal dynamic model [13], the state-of-the-art
CCC-biclustering method [20] has the limitation that it usually cannot cover all or most genes and time
points. To overcome this problem, an in-house biclustering method (denoted EBB: Error-Bounded
Biclustering) is used to enumerate so-called error-bounded linear patterns, e.g. the traditional
shifting and scaling patterns [22], which can model a group of genes with a similar tendency of
expression change. These patterns are then assembled into the proposed temporal blocks by estimating the
boundary genes defined below.

Figure 1 Different schemes for time segmentation

The framework of EBB includes three main steps: (1) discretizing the raw data matrix into a 0-1 matrix
using a reference element of the data matrix and a given error bound; (2) building a suffix tree from
the 0-1 sequences encoded by the rows of this 0-1 matrix, where '0' represents a left child node and '1'
represents a right child node; (3) identifying the deepest right-only node in the suffix tree as a
potential bicluster with an error-bounded linear pattern. CCC-biclustering is also an exhaustive method
[20], but it applies significant-trend filtering during data pre-processing and therefore cannot
guarantee to find all potential scaling or linear patterns. This leads to the loss of most low-signal
patterns and of some important expression patterns (e.g. linear patterns), which prevents the method
from exploiting all the information in the data. In contrast, EBB seeks linear patterns, which cover the
traditional shifting and scaling patterns [22], so that in theory it can identify all interesting
expression patterns. Moreover, EBB retains as many low-varying signals as possible because it uses an
error bound rather than a tendency bound to discretize the raw data.
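To illustrate what an error-bounded linear pattern means, the following minimal sketch checks whether one gene's profile over a set of time points is a linear function a*x + b of a reference profile, up to a tolerance. It is not the EBB implementation: the helper name `fits_linear_pattern`, the least-squares fit and the use of the maximum absolute residual as the error measure are assumptions made only for this illustration.

```python
def fits_linear_pattern(reference, profile, error_bound):
    """Return True if profile ~ a * reference + b within error_bound.

    Hypothetical helper: the least-squares fit and the max-residual
    criterion are assumptions of this sketch, not the EBB procedure.
    """
    n = len(reference)
    mean_x = sum(reference) / n
    mean_y = sum(profile) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    if sxx == 0:  # flat reference: only a constant profile can match
        return max(abs(y - mean_y) for y in profile) <= error_bound
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, profile))
    a = sxy / sxx                # fitted scaling factor
    b = mean_y - a * mean_x      # fitted shift
    residuals = (abs(y - (a * x + b)) for x, y in zip(reference, profile))
    return max(residuals) <= error_bound


base = [1.0, 2.0, 4.0, 3.0]
shifted = [x + 0.5 for x in base]   # shifting pattern: a = 1, b = 0.5
scaled = [2.0 * x for x in base]    # scaling pattern: a = 2, b = 0
unrelated = [3.0, 1.0, 2.0, 5.0]    # no linear relation to base
```

With these toy profiles, both the shifted and the scaled rows pass the check against `base`, which shows why a linear pattern covers the two traditional pattern types, while the unrelated row fails for a strict error bound.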

It is well known that a bicluster represents the similar expression behavior of a group of genes at the
same time points. Our temporal block, however, gathers genes with a cooperative expression change during
a specific time period, i.e. genes that simultaneously gain or lose similar expression with their
partner genes. Qualitatively, a temporal block is a sub-matrix of the original data that covers as many
complete biclusters as possible while splitting as few known biclusters as possible. According to the
following concepts and definitions, the genes on the so-called temporal boundary are used to divide the
whole data matrix into different sub-matrices named temporal blocks (see Figure 1 (C)).

Definition 1 (Boundary gene and set) Given a data matrix $D = \{d_{m,n}\}_{m \in I, n \in J}$, let a set
of gene expression patterns be given as biclusters $\{P_i = (G_i, T_i) \mid G_i \subseteq I, T_i \subseteq J\}$.
Then, a gene $g$ in $I$ is on the temporal boundary at time point $t$ in $J$ only when its $R$ value is
larger than a given threshold $\theta$ with default value one, where $R$ is calculated by formula (1).
The boundary genes at every time point constitute the boundary set
$\{BG(t) = \{g \mid R(g, t) > \theta, g \in I\}\}_{t \in J}$.

$$R(g, t) = \frac{|\{T_i \mid g \in G_i,\ t \in T_i,\ t = \min_{\tau \in T_i} \tau\}|}{|\{T_i \mid g \in G_i,\ t \in T_i,\ t \neq \min_{\tau \in T_i} \tau\}|} \qquad (1)$$
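The boundary set can be sketched in a few lines. The sketch reads R(g, t) as the number of biclusters containing gene g that start at time point t, divided by the number of biclusters containing g that cover t without starting there; this reading, the function names, the toy biclusters and the treatment of an empty denominator as infinity are all assumptions of the illustration rather than details confirmed by the text.

```python
def boundary_ratio(gene, t, biclusters):
    """R(gene, t) under the assumed reading of formula (1).

    biclusters: list of (gene set, time-point set) pairs.
    An empty denominator with a non-empty numerator is treated as
    infinity (an assumption of this sketch).
    """
    starting = sum(1 for genes, times in biclusters
                   if gene in genes and t in times and t == min(times))
    passing = sum(1 for genes, times in biclusters
                  if gene in genes and t in times and t != min(times))
    if starting == 0:
        return 0.0
    return float("inf") if passing == 0 else starting / passing


def boundary_set(genes, time_points, biclusters, theta=1.0):
    """BG(t) for every t: the genes whose R value exceeds theta."""
    return {t: {g for g in genes if boundary_ratio(g, t, biclusters) > theta}
            for t in time_points}


# Toy biclusters, purely illustrative.
toy = [({"g1", "g2"}, {2, 3, 4}),
       ({"g1", "g3"}, {3, 4, 5}),
       ({"g1"}, {3, 4})]
bg = boundary_set(["g1", "g2", "g3"], range(1, 7), toy)
```

In this toy case g1 and g2 are boundary genes at time point 2, g1 and g3 at time point 3, and no gene is on the boundary at time point 4, where every covering bicluster is already under way.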

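Definition 2 (Temporal block) Given a data matrix $D = \{d_{m,n}\}_{m \in I, n \in J}$ and its boundary
set $BG$, the temporal block $B_i = \{(G_i, T_i) \mid G_i \subseteq I, T_i \subseteq J\}$ should satisfy
the following conditions:

(a) $\forall g \in G_i,\ g \in BG(\min_{\tau \in T_i} \tau)$

(b) $\forall g \in G_i,\ g \in I - BG(\min_{\tau \in T_i} \tau - 1)$ or $\min_{\tau \in T_i} \tau = \min_{\tau \in J} \tau$

(c) $\forall g \in G_i,\ g \in I - BG(\max_{\tau \in T_i} \tau)$ or $\max_{\tau \in T_i} \tau = \max_{\tau \in J} \tau$

(d) $\forall g \in G_i,\ g \in BG(\max_{\tau \in T_i} \tau + 1)$ or $\max_{\tau \in T_i} \tau = \max_{\tau \in J} \tau$

(e) $\forall G \subseteq G_i,\ T \subset T_i$, $(G, T)$ does not satisfy conditions (a)-(d);

(f) $\forall G \subseteq I - G_i,\ T = T_i$, $(G, T)$ does not satisfy conditions (a)-(d).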

For convenience, $\min_{\tau \in T_i} \tau$ denotes the starting point (left end) of a temporal block
and $\max_{\tau \in T_i} \tau$ denotes its ending point (right end); the same notation applies to
temporal biclusters. Some additional differences between the proposed temporal block and the traditional
bicluster are discussed in the next section.



Zeng and Chen BMC Systems Biology 2012, 6(Suppl 1):S12 
http://www.biomedcentral.com/1752-0509/6/S1/S12






Causal process model: expansion of temporal block for 
functional enrichment analysis 

Like temporal segmentation, CPM gives a non-overlapping division of the whole data. This means that one
gene at one time point belongs to at most one temporal block, although the same gene can belong to a
different temporal block at a different time, i.e. no temporal block can cover another one in CPM.
Taking Figure 2 as an example, six genes $\{g_1, g_2, g_3, g_4, g_5, g_6\}$ might have coherent
expression on time points $\{t_3, t_4, t_5, t_6\}$. In order to reflect the different gene
reorganization events happening at time points $t_2$ and $t_3$, these genes are divided into two
temporal blocks during the co-expression period. This is the over-division phenomenon known from
biclustering studies, which can supply a multi-granularity model for overlapping patterns [23]. When
analyzing functional enrichment on temporal blocks, the over-divided genes should be gathered again.
This can easily be achieved by the expansion of temporal blocks.

Definition 3 (Expanded temporal block) Given a data matrix $D = \{d_{m,n}\}_{m \in I, n \in J}$ and its
temporal block $B_i = \{(G_i, T_i) \mid G_i \subseteq I, T_i \subseteq J\}$, the corresponding expanded
temporal block $B_i^* = \{(G_i^*, T_i^*) \mid G_i^* \subseteq I, T_i^* \subseteq J\}$ satisfies:
$G_i^* \subseteq I$, $G_i^* \supseteq G_i$ and $T_i^* = T_i$, where $C_{x,y}$ represents the Pearson
correlation coefficient between the expression profiles of two genes during the time period $T_i$, and
$\rho$ is a threshold with a default value of 0.8.

Therefore, the temporal blocks give a global scheme of the data division, whereas the expanded temporal
blocks are suited to reflecting the local properties of large data.

Causal process model: temporal block network 
construction based on partial correlation 

In order to extract the cascade dynamics of temporal blocks, which represent the sequential order of
biological processes, a directed network has to be built among the temporal blocks, whose connections
are evaluated by partial correlation [24]. It should be emphasized that, at present, our model concerns
linear relationships (i.e., linear patterns in temporal biclusters), so correlation rather than mutual
information is used to measure relationships. To infer direct rather than indirect correlation among
genes, we adopted the partial correlation, which measures the association between two genes after
removing the effect of their controlling genes.

Definition 4 (Partial correlation) Given three gene expression profiles or vectors X, Y and Z, the
partial correlation between X and Y under condition Z is calculated as:

$$PR(X, Y \mid Z) = \frac{C_{X,Y} - C_{X,Z}\, C_{Y,Z}}{\sqrt{(1 - C_{X,Z}^2)(1 - C_{Y,Z}^2)}} \qquad (2)$$

where $C_{\cdot,\cdot}$ represents the Pearson correlation coefficient.

Definition 5 (Link strength between temporal blocks) Given two temporal blocks $B_1 = (G_1, T_1)$ and
$B_2 = (G_2, T_2)$ with $\min_{\tau \in T_1} \tau < \min_{\tau \in T_2} \tau < \max_{\tau \in T_1} \tau + 1$,
these two blocks have a link directed from $B_1$ to $B_2$. The link strength between their gene
expression profiles in the time period
$[\min_{\tau \in T_1} \tau,\ \min(\max_{\tau \in T_1} \tau, \max_{\tau \in T_2} \tau)]$ can be calculated
as:

$$LS(B_1, B_2) = \sum_{X \in G_1} \max_{Y \in G_2} \Big( \min_{Z \in G_2,\ Z \neq X, Y} |PR(X, Y \mid Z)| \Big) \qquad (3)$$

This strength measurement indicates the potential partial relation from genes in a source block $B_1$ to
genes in a target block $B_2$. It requires that a gene X in the source can directly interact with a gene
Y in the target (the correlation between X and Y is maximal, as in the above definition), or be
indirectly related to Y without conduction through other target genes (the minimal partial correlation
between X and Y under the control of any Z is maximal, as in the above definition). When the link
strength is larger than a threshold with a default value of 0.9, the connected temporal blocks are
considered to have a significant causal relation.

Figure 2 Illustration of temporal blocks based on the estimated boundary genes.
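The partial correlation of Definition 4 and the link strength of Definition 5 can be sketched together as follows. This is an illustration under stated assumptions, not the released CPM program: the function names and toy profiles are hypothetical, profiles are assumed to be already restricted to the shared time period, an undefined partial correlation is returned as 0.0, and the optional division by the number of source genes (`normalize=True`) is an assumption introduced so that the value is comparable with a threshold such as 0.9.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z, eps=1e-12):
    """PR(X, Y | Z) as in formula (2); 0.0 when undefined (assumption)."""
    cxy, cxz, cyz = pearson(x, y), pearson(x, z), pearson(y, z)
    a, b = 1.0 - cxz ** 2, 1.0 - cyz ** 2
    if a <= eps or b <= eps:  # Z (almost) perfectly correlated with X or Y
        return 0.0
    return (cxy - cxz * cyz) / math.sqrt(a * b)


def link_strength(source, target, normalize=True):
    """Sum over source genes X of max over Y of min over Z of |PR(X, Y | Z)|.

    source, target: dicts mapping gene name -> expression profile.
    normalize=True divides by the number of source genes (assumption).
    """
    total = 0.0
    for x_name, x in source.items():
        best = 0.0
        for y_name, y in target.items():
            controls = [z for z_name, z in target.items()
                        if z_name not in (x_name, y_name)]
            if controls:
                weakest = min(abs(partial_correlation(x, y, z))
                              for z in controls)
                best = max(best, weakest)
        total += best
    return total / len(source) if normalize else total


# Toy example: X and Y are correlated only because both follow Z.
z = [-3.0, -1.0, 1.0, 3.0]
x = [-2.0, -2.0, 0.0, 4.0]
y = [-4.0, 2.0, -2.0, 4.0]
plain, partial = pearson(x, y), partial_correlation(x, y, z)

# Toy blocks: source gene "x" tracks target gene "y" directly.
source = {"x": [1, 1, 1, 1, -1, -1, -1, -1]}
target = {"y": [4, 4, 2, 2, -2, -2, -4, -4],
          "z1": [1, -1, 1, -1, 1, -1, 1, -1],
          "z2": [1, 1, -1, -1, -1, -1, 1, 1]}
unrelated = {"u": [1, -1, 1, -1, -1, 1, -1, 1]}
ls_linked = link_strength(source, target)
ls_unlinked = link_strength(unrelated, target)
```

In the first toy example the plain correlation between X and Y is clearly positive while the partial correlation given Z vanishes, which is the indirect association that Definition 4 is meant to remove. In the second, the linked source block exceeds the 0.9 threshold and the unrelated one scores zero.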

Based on the links (edges) with strengths (weights) among temporal blocks (nodes), the temporal block
network (TBN) is constructed for in-depth analysis of dynamic biological processes. The executable
program (CPM) for temporal blocks can be accessed from http://www.sysbio.ac.cn/cb/chenlab/software.htm.

Result and discussion 

The proposed temporal blocks and traditional biclusters have different characteristics. Due to the
module-in-focus property of biclustering, biclusters always overlap with each other and are smaller
(i.e., in terms of clusters) than the original data [20]. Eliminating the redundancy of these
overlapping biclusters is still a relevant and open question in the study of biclustering. In the
present work, on the other hand, CPM is little affected by potential bicluster redundancy, owing to the
principles of temporal block construction. To divide the original time course data, temporal blocks
rather than biclusters are used to build the dynamic model through boundary gene estimation, so that a
temporal block is not a traditional bicluster pattern but an assembly of biclusters. In other words, a
temporal block does not solely represent coherent expression, as a bicluster does, but represents events
in which similar expression patterns change (condition (a) in Definition 2), in line with the concept of
gene reorganization across neighbouring time windows [13]. With conditions (b), (c) and (d) in
Definition 2, a temporal block can tolerate a so-called disorder period, in which boundary genes are
present at consecutive time points at the left end of the temporal block. It can also allow a so-called
asynchronous ending period, in which genes at the right end of the temporal block are not on the
temporal boundary or do not even belong to any original bicluster pattern. In addition, the completeness
of temporal blocks is guaranteed by conditions (e) and (f) in Definition 2. Together, these properties
let temporal blocks reasonably represent non-overlapping sub-regions of the original data.

For instance, in the matrix (with synthetic R values) of Figure 2, an element in red indicates that its
gene (row) is on the temporal boundary at its time point (column); an element in blue means that its
gene is not on the temporal boundary but is at the starting time point of a few biclusters; and an
element in orange indicates that its gene is not at the starting time point of any bicluster. Therefore,
the temporal block $\{(g_5, g_6), (t_3, t_4, t_5, t_6)\}$ has neither a disorder period nor an
asynchronous ending period, while the temporal block $\{(g_1, g_2, g_3, g_4), (t_2, t_3, t_4, t_5, t_6)\}$
covers a disorder period, because genes $(g_1, g_2, g_3)$ are at time points $(t_2, t_3)$, and an
asynchronous ending period for genes $(g_3, g_4)$ at time points $(t_5, t_6)$.

Furthermore, the running time of CPM is dominated by the construction of temporal blocks through
temporal bicluster mining, which, like CCC-biclustering, has polynomial time complexity [20].

Gene-specific temporal segmentation by CPM shown on 
synthetic data 

First of all, we analyzed CPM on synthetic data with a simple but typical strategy adopted in previous
studies [23]. We produced a random data matrix with 10 rows and 15 columns. Five predefined blocks or
patterns, each with five genes and four consecutive time points, were embedded into this matrix. As the
patterns to be recovered in the synthetic data are perfect, we used a strict error bound of 0.0001 and
minimum bicluster sizes of 3*3, 3*4, 4*3 and 4*4, respectively, to run CPM (hereafter, the notation x*y
means that a bicluster contains at least x genes and y time points). The divisions of the whole
synthetic data into temporal blocks under the different parameter settings are shown in Figure 3, where
each temporal block is surrounded by a yellow box. Two points about these results should be emphasized.
First, regarding the minimum bicluster size, biclusters with a shorter time period lead to more
sub-blocks due to over-division (3*3 in Figure 3 (A) and 4*3 in Figure 3 (C)) than those with a longer
time period (3*4 in Figure 3 (B) and 4*4 in Figure 3 (D)), but all blocks are still reasonable and
acceptable. Second, according to the proposed design principles, each temporal block can cover all time
points of a predefined pattern plus some asynchronous ending period (e.g. the cases shown in Figure 3),
in order to tolerate noise/error and divide the whole data in a unified way. In summary, CPM can
simultaneously group genes and find gene-specific time divisions, which usually cannot be obtained by
traditional time segmentation methods, and it can further split the whole data matrix into different
sub-matrices, which is disregarded in many previous biclustering studies.
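A synthetic matrix of this kind can be generated as in the sketch below. The text does not specify how the five blocks were placed or which perfect pattern they follow, so the regular, non-overlapping grid layout, the choice of a shifting pattern, the function name `make_synthetic` and the fixed random seed are all assumptions of this illustration.

```python
import random


def make_synthetic(n_rows=10, n_cols=15, block_rows=5, block_cols=4,
                   n_blocks=5, seed=0):
    """Random matrix with embedded perfect shifting-pattern blocks.

    Returns (data, blocks), where blocks lists the (row indices,
    column indices) of every embedded pattern. The grid layout and
    the shifting pattern are assumptions of this sketch.
    """
    rng = random.Random(seed)
    data = [[rng.random() for _ in range(n_cols)] for _ in range(n_rows)]
    # Candidate top-left corners on a regular, non-overlapping grid.
    slots = [(r, c)
             for r in range(0, n_rows - block_rows + 1, block_rows)
             for c in range(0, n_cols - block_cols + 1, block_cols)]
    if n_blocks > len(slots):
        raise ValueError("not enough room for the requested blocks")
    blocks = []
    for r0, c0 in slots[:n_blocks]:
        base = [rng.random() for _ in range(block_cols)]
        for i in range(block_rows):
            shift = rng.random()  # each gene is the base profile plus a shift
            for j in range(block_cols):
                data[r0 + i][c0 + j] = base[j] + shift
        blocks.append((list(range(r0, r0 + block_rows)),
                       list(range(c0, c0 + block_cols))))
    return data, blocks


data, blocks = make_synthetic()
```

With the default arguments this yields a 10 x 15 matrix containing five 5 x 4 blocks. Within each block the difference between any two rows is constant over the four time points, so the pattern is perfect and a strict error bound is appropriate.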

Phase description by CPM comparing with CCC- 
biclustering based method 

Figure 3 Temporal blocks on synthetic data according to CPM with different parameter settings.

Then, we applied CPM to the Yeast cell cycle data from the α-factor synchronization experiment of
Spellman et al. [25]. This dataset comprises two cell cycles, each containing three phases: M/G1, G1&S,
and G2&M [13,25]. Every phase spans three time points in the experiment, with a constant time interval
of 7 minutes. After using one-way ANOVA [26] to select genes (i.e. setting the number of sample (time
point) groups to six, based on prior knowledge of the six phases in two cell cycles, and computing the
P-value from the F-distribution with a significance threshold of 0.05), the remaining data, denoted YCC,
with 730 genes and 18 time points, was used for further analysis. Again, we used different error bounds
in {0.05, 0.1, 0.15, 0.2, 0.25} and a minimum bicluster size of 10*5 (empirical values from a previous
study) to build CPMs on the YCC data for extensive evaluation.
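The gene selection step can be sketched as a one-way ANOVA over six groups of three consecutive time points. The sketch is only an illustration: the function names and toy profiles are hypothetical, and instead of computing a P-value it compares the F statistic with the tabulated upper 5% point of the F-distribution with (5, 12) degrees of freedom, which is equivalent to the 0.05 threshold for this particular design.

```python
def phase_groups(profile, n_groups=6):
    """Split a profile into consecutive, equally sized time-point groups."""
    size = len(profile) // n_groups
    return [profile[i * size:(i + 1) * size] for i in range(n_groups)]


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:  # no within-group variation at all
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Tabulated upper 5% critical value of F with (5, 12) degrees of freedom,
# i.e. 6 groups and 18 time points.
F_CRIT_5_12 = 3.106


def is_phase_varying(profile):
    """Keep a gene when its expression differs between the six phases."""
    return anova_f(phase_groups(profile)) > F_CRIT_5_12


# Toy profiles over 18 time points (hypothetical values).
periodic = [0.0, 0.1, -0.1, 1.0, 1.1, 0.9, 2.0, 2.1, 1.9] * 2
flat = [0.0, 1.0, 2.0] * 6
```

The toy periodic profile changes level from phase to phase and is retained, whereas the flat profile has identical group means and is filtered out.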

As described before, the boundary genes can be used to trace the role-change events of a group of genes,
and their number would increase greatly at a time point around the alternation of phases [13]. Because
of the need to cover a possible disorder period, a few boundary genes are not effective in the temporal
block construction; the others are the refined boundary genes located at the left end (starting time
point) of the final temporal blocks. Based on the statistics of temporal blocks and the boundary genes
they depend on, Figure 4 shows two kinds of distributions of boundary gene numbers under different CPM
parameter settings, where the dotted line represents the distribution of the original boundary genes and
the solid line represents the distribution of the refined boundary genes. The distributions of the
numbers of refined boundary genes reveal more convincing phase-related characteristics than those of the
original boundary genes, thereby confirming the effectiveness of the temporal blocks. Note that
"boundary genes" means the refined ones in the following discussion. When the error bound is strictly
set to 0.05, the peaks of the distributions of boundary genes are always located in the middle of each
phase, because genes try to keep their status of steady coordination (note that the strictest parameter
setting, 0.01, results in no bicluster output). When the error bound is set to 0.1, the peaks are always
located at the time point of a phase transition, because genes usually start to cooperatively facilitate
functions at this time and a temporal block can cover the potential initial disorder period. On the
other hand, when the error bound is set to 0.15 or larger, the distributions of boundary genes no longer
correlate with the phases, because much noise is introduced that confuses genes on and off the temporal
boundaries. Therefore, CPM can directly use the distributions of boundary genes to trace the critical
time points of phase transitions; the required parameter setting can be estimated from both the
experience of data analysts and the pattern quality evaluation of biclustering.




Figure 4 Statistic view of boundary genes by CPM with different parameter settings (horizontal axis: time points 0-17).
To further confirm the efficiency of the proposed (EBB) bicluster-based segmentation method compared
with other bicluster-based methods, we used temporal biclusters produced by CCC-biclustering [20] (under
five different parameter settings, with 1.0 as the default value) to assemble temporal blocks again and
re-analyzed the relations between developmental stages and the distribution of boundary genes. Compared
with Figure 4, the results in Figure 5 illustrate that CPM is more suitable for phase description than
traditional temporal biclustering based segmentation. Further discussion of the differences between
bicluster-based segmentation and traditional temporal segmentation is beyond the scope of this paper,
because they belong to two distinct methodological categories, biclustering and clustering.

Temporal trace identification by CPM with functional 
enrichment analysis 

According to the above discussion on parameter settings, we chose the temporal blocks obtained with the
most suitable error bound, 0.1, for the following functional enrichment analysis [27]. In the temporal
block expansion and the temporal block network construction, the default thresholds were used
throughout.

Biological processes during phase transition revealed by 
CPM and comparison with temporal dynamical model 

Due to the minimum length requirement of biclusters, the last four time points were not divided in our
experiments. We therefore investigated the biological processes enriched in the temporal blocks
corresponding to the first phase and the latter two phases in each cell cycle, for comparison with the
temporal dynamical model [13]. As in the previous study, the cyclic presence and absence of some
biological processes in the two cell cycles are shown in Figure 6. The obtained biological processes are
close to those identified by the temporal dynamic model, such as amino acid biosynthetic process, cell
wall chitin biosynthetic process, chromosome condensation and nucleosome assembly [13]. Therefore, CPM
can indeed reveal biological processes related to phase transitions by analyzing the phase segmentation
and the temporal block network.

It is worth noting that the potential causal relations between temporal blocks in CPM can further
strengthen the cascade relations of phases within or across cell cycles. Figure 7 displays the whole
temporal block network (where the edges between temporal blocks with the same starting point were
omitted so as to focus on the major asynchronous temporal relations), in which the nodes represent the
temporal blocks, denoted $\{TB_k\}$; the directed edges represent causal relations; the node label shows
the id $k$ of the temporal block and its time segmentation $[f, t]$ in the form "k - [f, t]"; and green,
blue, yellow and pink nodes denote phase related, cell cycle related, cross-phase related and other
kinds of temporal blocks, respectively.

♦ There are directed edges linking temporal block $TB_{63}$ to $TB_{77}$, and temporal block $TB_{21}$
to $TB_{41}$. These are relations between phases within a cell cycle, which further confirm the phase
related biological processes shown in Figure 6.



Figure 5 Statistic view of boundary genes by CCC-biclustering based method with different parameter settings (horizontal axis: time points 0-17; curves: boundary genes and refined boundary genes for each CCC setting).

Figure 6 Biological processes with potential circular behaviour enriched in phase related temporal blocks



♦ Among all temporal blocks, only $TB_{82}$ directly connects $TB_{77}$ and $TB_{21}$, thereby acting as
a bridge of (expression) correlation between the last phases of the first cell cycle and the initial
phase of the second cell cycle. This means that CPM can also identify relations between phases across
cell cycles, and is able to infer the cascade dynamics of biological functions such as biological
processes across multiple cell cycles. Note that the previous temporal dynamic model needs multiple
datasets to deduce causal relations between biological processes [13], whereas CPM can infer meaningful
functional cascade dynamics during biological transitions even from a single dataset. At present, it is
difficult to discuss in depth the biological processes that do not start at a "check point" of some
phase or cell cycle, owing to the lack of relevant biological data; however, a few processes enriched in
temporal block $TB_{82}$, such as protein-DNA complex assembly and nucleosome assembly, suggest that
some of those functions hold before entry into the next phase or cell cycle.

♦ While the temporal dynamical model strongly shows the similarity of the two cell cycles after α-factor
treatment [13], CPM can additionally be used to elucidate the specificities of the cell cycle related
temporal blocks $TB_{39}$ and $TB_{20}$ in Figure 7. These two cell cycle related temporal blocks (their
functional analysis is discussed in detail in the next subsections) have no direct edge between them,
but they are again connected through temporal block $TB_{82}$. This supports the need for, and the
importance of, novel temporal blocks across neighbouring functional periods, which are modelled by the
gene-specific temporal segmentation integrated in CPM.

Functional enrichment variance during continuous cell
cycles after α-factor treatment

The 1st cell cycle related temporal block $TB_{39}$ covers the first three phases, with time points 0-8,
and has 12 genes, expanded to 432. The 2nd cell cycle related temporal block $TB_{20}$ covers the last
three phases, with time points 9-17, and has 42 genes, expanded to 400. For these two expanded gene
sets, the significant phase (cell cycle)-related biological processes and pathways are listed in Tables
1 and 2. The 1st cell cycle related genes and the 2nd cell cycle related genes show several different
biological processes annotated in GO [28], and the 1st cell cycle related genes are frequently observed
in biological pathways annotated in KEGG and Reactome [29,30]. Therefore, the two cell cycles after
α-factor treatment can be regarded as two super-phases with distinct dynamical properties, which is
helpful for understanding the cascade dynamics of complicated biological procedures across multiple
phases or cycles.

Figure 7 Temporal block network on YCC dataset

Table 1 Biological processes enriched in two cell cycles according to genes in $TB_{39}$ and $TB_{20}$

Biological process    1st cell cycle    2nd cell cycle

Table 2 Biological pathways enriched in two cell cycles according to genes in TB 39 and TB 20

Pathway    1st cell cycle    2nd cell cycle

Amino sugar and nucleotide sugar metabolism ✓ ✓

Steroid biosynthesis ✓

Fructose and mannose metabolism ✓

Regulation of beta-cell development ✓

Regulation of gene expression in beta cells ✓

In addition, to re-validate the cell cycle specificity of gene expression in these two temporal blocks, we used their genes to conduct hierarchical clustering with appropriate distance measurements [31] on our analyzed dataset and on three other independent Yeast gene expression datasets that also cover two cell cycles after α-factor treatment. These were downloaded from NCBI GEO under accessions GDS2318 [32] (one contributed dataset, denoted YCC-gds2318) and GSE4987 [33] (two contributed datasets that are dye-swap technical replicates, denoted YCC-gse4987-53 and YCC-gse4987-35). On the four datasets YCC (Figure 8(A)), YCC-gds2318 (Figure 8(B)), YCC-gse4987-53 (Figure 8(C)) and YCC-gse4987-35 (Figure 8(D)), the genes from TB 39 correctly classify almost all time points into the two cell cycles, disregarding the effect of potentially circular expression profiles within cell cycles. According to Figure 9, genes from TB 20 also perform well in clustering time points from different cell cycles. Because the other independent datasets contain missing expression values (filled with zero), we analyzed the molecular network behind this cell cycle specificity only on our main YCC dataset in the next subsection.
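
The clustering check described above can be illustrated with a small self-contained sketch. It assumes a correlation distance (1 minus the Pearson correlation between time points, computed over the block's genes) and average linkage; the paper only refers to "appropriate distance measurements" [31], so both choices, the function names and the toy matrix are assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def cluster_time_points(expr, n_clusters=2):
    """Average-linkage agglomerative clustering of time points.

    expr: list of gene rows, each a list of expression values per time point.
    Returns a sorted list of clusters, each a sorted list of time indices.
    """
    n_time = len(expr[0])
    columns = [[row[t] for row in expr] for t in range(n_time)]
    dist = {(i, j): 1.0 - pearson(columns[i], columns[j])
            for i, j in combinations(range(n_time), 2)}

    def d(i, j):
        return dist[(i, j)] if i < j else dist[(j, i)]

    clusters = [[t] for t in range(n_time)]
    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest mean pairwise distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: mean(d(i, j)
                               for i in clusters[p[0]]
                               for j in clusters[p[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy block of three genes over six time points (two "cycles" of three).
toy_expr = [
    [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],
    [1.0, 0.8, 1.1, 5.0, 5.1, 4.8],
    [3.0, 3.1, 2.9, 3.0, 2.8, 3.2],
]
print(cluster_time_points(toy_expr))
```

If the genes of a temporal block are cell cycle specific, the two recovered clusters should coincide with the two cycles, as they do for the toy matrix.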

Relation among modules and complexes in protein interaction network rewiring and temporal trace of biological phase transitions

The co-expression network [34] was also used to reflect the potential cell cycle specificity after α-factor treatment through the rewired structures of the protein interaction network (PIN). Given a cell cycle related temporal block TB(G, T), we took its group of genes G and obtained the interactions among the proteins encoded by these genes from the STRING database [35]. Using the Yeast protein subcellular localization information [36], denoted Yeast-eSLDB, we filtered the interactions by requiring that the two proteins involved in an interaction share a candidate subcellular localization (because one protein may move among several subcellular localizations, we only considered the location "Nucleus", which has the most known protein members). Based on the expression profiles of these co-localized proteins in the different cell cycles $\{T_i\}_{i=1,2}$ (for some $i$, $T = T_i$), we calculated the Pearson correlation coefficient of each pair of interacting proteins. Combining the proteins and the interactions with their weights (correlations), we extracted a PIN-conducted co-expression network (PCCN).
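
The construction steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: interactions, localizations and expression profiles are passed in as plain Python containers (no STRING or eSLDB access is performed), and the function name `build_pccn`, its parameters and the toy data are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def build_pccn(genes, interactions, localization, expression,
               time_points, compartment="Nucleus"):
    """Build a PIN-conducted co-expression network for one cell cycle.

    genes        : genes G of the temporal block
    interactions : iterable of (protein_a, protein_b) pairs
    localization : dict protein -> set of candidate compartments
    expression   : dict protein -> list of expression values
    time_points  : indices of the time points T_i of the chosen cycle
    Returns a dict mapping a sorted protein pair to its correlation weight.
    """
    genes = set(genes)
    network = {}
    for a, b in interactions:
        # Keep only interactions among proteins encoded by block genes.
        if a not in genes or b not in genes:
            continue
        # Co-localization filter: both proteins must be in the compartment.
        if (compartment not in localization.get(a, ())
                or compartment not in localization.get(b, ())):
            continue
        xa = [expression[a][t] for t in time_points]
        xb = [expression[b][t] for t in time_points]
        network[tuple(sorted((a, b)))] = pearson(xa, xb)
    return network


# Toy inputs: protein C is not nuclear and D is outside the block.
toy_genes = ["A", "B", "C"]
toy_interactions = [("A", "B"), ("B", "C"), ("A", "D")]
toy_localization = {"A": {"Nucleus"}, "B": {"Nucleus", "Cytoplasm"},
                    "C": {"Cytoplasm"}, "D": {"Nucleus"}}
toy_expression = {"A": [1, 2, 3, 3, 2, 1], "B": [2, 4, 6, 1, 2, 3],
                  "C": [1, 1, 2, 2, 3, 3], "D": [0, 1, 0, 1, 0, 1]}

pccn_cycle1 = build_pccn(toy_genes, toy_interactions, toy_localization,
                         toy_expression, time_points=[0, 1, 2])
pccn_cycle2 = build_pccn(toy_genes, toy_interactions, toy_localization,
                         toy_expression, time_points=[3, 4, 5])
print(pccn_cycle1, pccn_cycle2)
```

Calling the function once per cell cycle with the same gene set gives two networks with identical edges but different weights, which is what makes the rewiring between cycles comparable.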

mannose metabolic process ✓

external encapsulating structure organization ✓

cell wall organization or biogenesis ✓

cell wall organization ✓

cellular cell wall organization or biogenesis ✓

cellular cell wall organization ✓

cytokinetic cell separation ✓

cytokinesis, completion of separation ✓

cytokinesis ✓

transition metal ion transport ✓

iron ion transport ✓

chromatin assembly ✓

nucleosome assembly ✓

DNA conformation change

DNA packaging

chromatin assembly or disassembly

Thus, we used the genes in TB 39 and TB 20, with their expression profiles during the two cell cycles, to build four PCCNs. They are denoted $\{N_i^c\}_{i \in \{1,2\},\, c \in \{1,2\}}$, meaning that the genes/proteins in the $i$-th cell cycle related temporal block have a rewired PCCN in the actual $c$-th cell cycle. Figure 10(A)-(D) displays $N_1^1$, $N_1^2$, $N_2^1$ and $N_2^2$, respectively. Following the above discussion, $N_1^1$ and $N_2^2$ should have specific network characteristics corresponding to their cell cycles. In Figure 10, nodes in light blue represent genes belonging to TB 39, nodes in dark blue represent genes belonging to TB 20, and nodes in blue represent genes in the overlap of the two cell cycle related temporal blocks. Each interaction edge changes from light and thin to dark and thick as its absolute weight (correlation) increases. Using the network visualization of Cytoscape [37], we can easily observe the approximate network modules C 2 and C 3 in Figure 10. The largest protein complex, the Nucleosomal protein complex, extracted from the Yeast protein complex information [38] (denoted Yeast-CYC), is also highlighted as another module, C 1. Interestingly, three different changes in the network rewiring profile correspond to the specificities of proteins in the cell cycle related temporal blocks.

♦ Proteins in TB 39 are densely connected to module C 3 only in the first cell cycle and not in the second, while C 3 always has fewer contacts with proteins in TB 20. The presence or absence of a relation with module C 3 would therefore be a temporal trace of functional specificity in the first cell cycle.

♦ Proteins in both TB 39 and TB 20 show tight interactions with module C 2 in the first cell cycle but lose this relation in the following cell cycle. This means that, in our mathematical model, TB 39 mainly captures the presence of the relation with C 2, while TB 20 tends to capture its disappearance.

♦ Unlike the above two cases, protein complex C 1 strengthens its relation with proteins in TB 20 only in the second cell cycle and not in the first. Hence, the varying relation with protein complex C 1 can be a candidate temporal trace of functional specificity in the second cell cycle.
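
The three observations above compare how strongly a block's proteins are tied to a module in each cell cycle. One way to quantify this is to count strongly weighted edges between the block and the module in each PCCN. The sketch below is only an illustration: the paper reads these changes from the network visualization, so the threshold of 0.5, the function name `module_contact` and the toy networks are assumptions.

```python
def module_contact(network, block_proteins, module, threshold=0.5):
    """Count strong edges linking block proteins to module members.

    network        : dict mapping (protein_a, protein_b) -> correlation weight
    block_proteins : proteins of the temporal block
    module         : proteins of the module or complex
    threshold      : minimum absolute weight for an edge to count (assumed)
    """
    module = set(module)
    block_only = set(block_proteins) - module
    strong = 0
    for (a, b), weight in network.items():
        crosses = ((a in block_only and b in module)
                   or (b in block_only and a in module))
        if crosses and abs(weight) >= threshold:
            strong += 1
    return strong


# Toy PCCNs of one block in two cycles: same edges, different weights.
toy_cycle1 = {("P1", "M1"): 0.9, ("P2", "M1"): -0.8, ("P1", "P2"): 0.7}
toy_cycle2 = {("P1", "M1"): 0.1, ("P2", "M1"): -0.2, ("P1", "P2"): 0.7}
toy_block = {"P1", "P2"}
toy_module = {"M1"}

contact1 = module_contact(toy_cycle1, toy_block, toy_module)
contact2 = module_contact(toy_cycle2, toy_block, toy_module)
print(contact1, contact2)
```

A large drop in the count from one cycle to the next corresponds to the loss of a relation with a module, while an increase corresponds to a strengthened relation.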

Therefore, the protein interaction modules above and their relations with other proteins can be regarded as dynamical markers (or temporal traces) of cell cycles in phase transitions. The proposed temporal blocks, together with the causal process model, are thus effective in uncovering the molecular basis of a biological transition.

Conclusion 

To overcome the drawbacks of traditional time segmentation and temporal biclustering methods, the causal process model (CPM) was proposed to study biological phase transitions in a systematic way. The experimental results validated that CPM can effectively identify gene-specific temporal segmentations by means of a boundary gene estimation scheme, and efficiently infer the potential cascade dynamics of biological processes by constructing a temporal block network. CPM not only identified the phase-specific dynamic biological processes found by the traditional dynamic temporal model, but also revealed cell cycle specific rewiring of the protein interaction network, which was missed in previous studies. All in all, with further improvements in bicluster enumeration and sparse causal network inference, the proposed CPM can both detect unknown phase transitions in real biological systems and identify candidate functional cascade dynamics with temporal traces (or dynamical markers) during the transformation of a biological system.

Figure 9 Hierarchical clustering of genes and time points on four independent datasets according to temporal block related to 2nd cell cycle.

Zeng and Chen BMC Systems Biology 2012, 6(Suppl 1):S12
http://www.biomedcentral.com/1752-0509/6/S1/S12